-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clear all documents without dropping schema #33
Clear all documents without dropping schema #33
Conversation
@@ -294,7 +294,8 @@ defmodule ExTypesense.Collection do | |||
@doc """ | |||
Permanently drops a collection by collection name or module name. | |||
|
|||
**Note**: dropping a collection does not remove the referenced alias, only the indexed documents. | |||
**Note**: dropping a collection does not remove the referenced | |||
alias, only the indexed documents. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Is this really correct? We drop the collection altogether right, not just the indexed documents?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shijithkjayan yes everything is removed when collection is dropped except alias (you have to explicitly delete the alias).
Here are the details:
- dropping collection will also delete all documents
- dropping collection will not affect alias
- removing all documents will not drop the collection
The reason why alias is not removed when dropping collection is because of this example scenario:
A new collection (including its documents) was created and it replaces the dropped collection and points it to the existing alias.
- This is especially helpful when you need to update the schema of a collection. You'd create a new timestamped collection, re-index your data in it and then swap the alias to point to it. source
- You want to use the Collection Alias feature to do a reindex without any downtime into a new collection and then switch the alias over to the new collection. source
see test/collection_test.exs
for the updates. LMK your thoughts on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete_by_query
feature LGTM
def delete_all_documents(conn \\ Connection.new(), collection_name) | ||
|
||
def delete_all_documents(conn, collection_name) when is_binary(collection_name) do | ||
delete_documents_by_query(conn, collection_name, %{filter_by: "#{collection_name}_id:>=0"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: In my case, the ID field of a document is called id
and not {collection_name}_id
. And I believe in most cases the collection name will not be prefixed to the id
field name. So is this a conscious decision you made to prefix the collection name here?
Also, in the github comment thread you attached in the function doc, it says that id>=0
will not work as the ID is a string field. I am sorry I missed this out while reviewing the changes. What do you think can be done here instead? @jaeyson
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a query that can work is %{filter_by: "id:!=0"}
. This way it doesnt matter that the id
is not an integer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I'll make changes to it.
In my case, the ID field of a document is called id and not {collection_name}_id. And I believe in most cases the collection name will not be prefixed to the id field name. So is this a conscious decision you made to prefix the collection name here?
The prefixed id is used from using search that returns an Ecto query (see: https://github.com/jaeyson/ex_typesense/blob/main/lib/ex_typesense/result_parser.ex#L1-L31)
it says that id>=0 will not work as the ID is a string field.
Yes, you're correct. Id in Typesense is a string, AFAIK the returned JSON response is like this (I'm basing on the docs and tested it):
{
"id": "124",
"company_name": "Stark Industries",
"num_employees": 5215,
"country": "USA"
}
To delete all documents in a collection, you can use a filter that matches all documents in your collection. For eg, if you have an int32 field called popularity in your documents, you can use filter_by=popularity:>0 to delete all documents
So the change here is from %{filter_by: "#{collection_name}_id:>=0"}
to %{filter_by: "id:!=0"}
just like you mentioned earlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shijithkjayan I forgot this part:
To delete all documents in a collection, you can use a filter that matches all documents in your collection. For eg, if you have an int32 field called popularity in your documents, you can use filter_by=popularity:>0 to delete all documents.
That's one of the reason why I used {collection_name}_id (type int32) instead if id (type string).
There's a thread discussion where a wildcard is not yet added as a feature:
https://threads.typesense.org/t/deleting-typesense-documents-with-specific-field-using-filter-by-/2J3912e
One problem here is if your PK in Ecto schema is a type of UUID (<-- I'll add this in my todo list)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if its UUID, %{filter_by: "id:!=0"}
will still work right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
id:!=0
@shijithkjayan I'm afraid not. Encountered an error by specifically doing that (id:!=0
).
see this CI test: https://github.com/jaeyson/ex_typesense/actions/runs/10260778543/job/28387381407
This error was discussed here, and another one here:
# this error only pops-up on versions below v26
`{:error, "Not equals filtering is not supported on the `id` field."}` <-- `!=` not supported
If you read this thread https://threads.typesense.org/2J3912e:
Can i delete that documents using
filter_by: "user_id:*"
It's not possible to do a wildcard match on a field with filter by.
That's the reason why id (or another field) won't work at the moment when using wildcard for filter_by
.
uuid
Even if its UUID
The reason why I brought up uuid
is because currently I'm prefixing collection name (table name in Ecto) in _id
and using it as default_sorting_field
Basing on their docs https://typesense.org/docs/26.0/api/collections.html#schema-parameters, it says in default_sorting_field
:
The name of an int32 / float field that determines the order in which the search results are ranked when a sort_by clause is not provided during searching.
So I'm not sure if a default_sorting_field
with type string (uuid
) would work in this case.
Hope this answers your questions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually able to clear all documents in my collection with %{filter_by: "id:!=0"}
🤔
ExTypesense.delete_documents_by_query(collection_name, %{filter_by: "id:!=0"})
--> This is what I used and it cleared everything with no errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shijithkjayan what typesense version are you using?
This error {:error, "Not equals filtering is not supported on the
id field."}
is from using Typesense v0.25.2 on the test run result https://github.com/jaeyson/ex_typesense/actions/runs/10260778543/job/28387381407
The reason I don't want to continue to make on that change (id:!=0
) is because it's not supported on v0.25.2.
I wanted this library to be backwards-compatible like atleast 1 or 2 versions below. I'll gladly change this based on your request once the version of Typesense bumps up, and not breaking on previous versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense 👍 Thanks for clarifying
Implements #28
Problem
Based on the Typesense issue, it seems that the only way to delete all documents in a collection is to simply delete the collection and recreate it.
Workaround
Implement a logic, preferably a catch-all filter if no query
filter_by
is passed in function.So, 2 possible ways to use the function: